An evaluation of Naive Bayes variants in content-based learning for spam filtering
Author
- Alexander K. Seewald
Abstract
We describe an in-depth analysis of the spam-filtering performance of a simple Naive Bayes learner and two current variants. A set of seven mailboxes comprising about 65,000 mails from seven different users, as well as a representative snapshot of 25,000 mails received by a single user over 18 weeks, was used for evaluation. Our main motivation was to test whether two variants of Naive Bayes learning, SpamAssassin and CRM114, were superior to simple Naive Bayes learning, represented by SpamBayes. Surprisingly, we found that the performance of these systems was remarkably similar and that the extended systems have significant weaknesses which are not apparent in the simpler Naive Bayes learner. SpamBayes also offers the most stable performance, in that it deteriorates least over time. Overall, SpamBayes should be preferred over the more complex variants. © 2005 Kluwer Academic Publishers.
Similar resources
Adaptive Spam Filtering Using Only Naive Bayes Text Classifiers
In the past few years, machine learning and in particular simple Naive Bayes classifiers have proven their value in filtering spam emails. We hereby put Naive Bayes filters to the test, against potentially more elaborate spam filters that will participate in the CEAS 2008 challenge. For this purpose, we use the variants of Naive Bayes that have proven more effective in our earlier studies. Furt...
Naive Bayes Spam Filtering Using Word-Position-Based Attributes
This paper explores the use of the naive Bayes classifier as the basis for personalised spam filters. Several machine learning algorithms, including variants of naive Bayes, have previously been used for this purpose, but the author’s implementation using word-position-based attribute vectors gave very good results when tested on several publicly available corpora. The effects of various forms o...
Naive Bayes spam filtering using word-position-based attributes and length-sensitive classification thresholds
This paper explores the use of the naive Bayes classifier as the basis for personalised spam filters. Several machine learning algorithms, including variants of naive Bayes, have previously been used for this purpose, but the author’s implementation using word-position-based attribute vectors gave very good results when tested on several publicly available corpora. The effects of various forms ...
Naive Bayes Spam Filtering Using Word Position Attributes
This paper explores the use of the naive Bayes classifier as the basis for personalized spam filters. Various machine learning algorithms, including variants of naive Bayes, have previously been used for this purpose, but the author’s implementation using word position based attribute vectors gives very good results when tested on several publicly available corpora. The effect of various forms ...
Not So Naive Online Bayesian Spam Filter
Spam filtering, as a key problem in electronic communication, has drawn significant attention due to the increasingly huge amounts of junk email on the Internet. Content-based filtering is one reliable method for combating spammers' changing tactics. Naïve Bayes (NB) is one of the earliest content-based machine learning methods, both in theory and practice, for combating spammers, which is eas...
Journal title:
- Intell. Data Anal.
Volume 11 Issue
Pages -
Publication date 2007